Recognizing that the emotional state of the student 
is integral to his ability to learn, educators now place emphasis on 
testing in the affective domain. With this increasing demand for test 
data, ethical considerations must be taken into account as 
measurement instruments are designed, administered, and interpreted. 
Difficulties in instrument design arise because of the complex and 
multidimensional nature of the affective domain. To date, the most 
useful method of categorizing the emotional state is through an 
assessment of student attitudes, interests, values, and 
appreciations. The most commonly used assessment technique is the 
self-report stimulus response selection approach which may involve a 
format that is forced choice or true-false. Scales include Guttman, 
Likert, Thurstone, and the semantic differential. Of the numerous 
types of item formats and scales, all have complex problems ranging 
from serious validity problems to high costs. Other methods for 
assessing the affective domain are the Q-Sort, interviews, and 
unobtrusive measures. (BJG) 
TESTING IN THE AFFECTIVE DOMAIN 
Thomas F. Donlon 

In 1951, E. F. Lindquist, writing in the first edition of Educational Measure-
ment, focused on the need for tests of hitherto unmeasured educational 
objectives. "If the descriptions of educational development of individual 
students provided by tests are to be truly comprehensive," he wrote, "tests 
and measuring devices must be developed for many more educational objectives 
than are now being measured at all. In general, satisfactory tests have thus 
far been developed only for objectives concerned with the student's intellec- 
tual development, or with his purely rational behavior. Objectives concerned 
with . . . moral values, attitudes toward social institutions and practices, 
. . . have been seriously neglected in educational measurement" (Lindquist, 
1951). 

By 1971, when the second edition of Educational Measurement was prepared, 
Krathwohl and Payne (1971) could describe the work on the Taxonomy of Educa- 
tional Objectives: II. The Affective Domain, as evidence of progress and of the 
increased importance of affective goals in education. The Taxonomy, an 
ambitious attempt to structure levels of affective response, indicates the 
validity of Carmen Finley's observation: "In recent years there has been a 
growing awareness of the need for schools to include the affective domain in 
the development of objectives for learning" (Finley, 1975). Or, as Robert 
Strom and E. Paul Torrance have observed, "A decade ago there was less discus- 
sion among educators about the affective domain than there is today . . . 
[there is] an emerging priority for emotional achievement" (Strom & Torrance, 
1975). 
The reasons for this expansion of interest are diverse and numerous. 
Attempts to evaluate schools and their functioning, increased efforts at 
accountability, shifts in the responsibilities of schools and families are 
all factors in the process. A number of major social problems, such as drug 
abuse or the assimilation of minorities, are seen as challenges to affective 
education. Further, affective characteristics are seen not only as the end- 
products of education, but as process characteristics: Too many learning 
problems are traceable to problems of motivation and the self-concept, and 
the schools must confront these dimensions of their pupils. 

The expanded emphasis on the affective domain inevitably brings renewed 
interest in the techniques for instruction and measurement in this area. 
These techniques, however, are not nearly as well developed as they are for 
the cognitive achievement areas. As Lindquist (1951) observed, ". . . attain- 
ment of these objectives is . . . difficult to measure, . . . so little is 
known about how to measure them, just as so little is known about how to teach 
them effectively." The problems Lindquist perceived are far from solved today. 
Nonetheless, there are a number of techniques available, useful in the assess- 
ment of characteristics such as interests, attitudes, and values. While all 
of these approaches are somewhat crude, and while all are vulnerable to dis- 
tortions of inference, they constitute a valuable resource for educators who 
establish affective objectives and who seek to measure the attainment of them. 

This paper is a brief statement of the major approaches to testing in the 
affective domain. The emphasis throughout is on paper-and-pencil approaches, 
and on objective strategies. An effort is made to characterize observational 
techniques and projective tests, but the major share of the discussion and 
information is devoted to paper-and-pencil approaches, in the belief that 
they have proven over the years to be the most practical methods for educa- 
tional assessment in the affective domain. 

GENERAL CONSIDERATIONS 

Terminology 

There is a large variety of concepts and labels in the affective domain; draw- 
ing precise verbal distinctions among them is simply not possible. Interests, 
attitudes, values, and appreciations have been suggested by Tyler (1975) as 
the main areas of the affective domain which are of interest to educators. 
"Personality test" is another term that is widely used and troublesomely 
ambiguous. In general, measurement specialists distinguish it from a test of 
attitudes or interests and reserve it for tests designed to measure persistent 
and emotional characteristics of mental functioning, such as introversion- 
extroversion, or aggressivity-docility. In this use, the scores on personal- 
ity tests describe general and emotional qualities of the mind. Tests of 
interests, attitudes, and values, then, are often not called personality tests 
because they have a specific content component external to the person: The 
person is interested in something outside the self, a sport or a book, or the 
person has a negative attitude toward Indonesia. The distinction is logically 
not very clear, however, and in the Seventh Mental Measurements Yearbook 
(Buros, 1972), the category called "Character and Personality" includes the 
well-known Study of Values, which measures broadly general interests or values. 
Similarly, in Anastasi (1968), the discussions of interest and attitude mea- 
sures are included in a section devoted to personality tests. At best, we can 
simply offer some crude definitions of terms and recognize that there is a 
great deal of overlap among them. This paper does this for Tyler's four cate- 
gories. The definitions, however, are those of the present author. 
Attitudes or opinions are personal judgments of the nature or value of 
something. As such, they are not facts, and they may be either broadly emo- 
tional ("The United States is the best country in the world.") or quasi- 
intellectual ("The United States should have the largest navy in the world."). 
Similarly, in education, student attitudes may be emotional ("I hate school.") 
or have a strong and specific intellectual component ("I feel seniors should 
have a place where they can go and smoke."). Because of their emotional and 
intellectual nature, attitudes are very difficult to define. Shaw and Wright 
(1967) provide an extensive discussion of these definitional problems, discuss- 
ing such words as opinion, belief, and trait. Further, attitudes may be 
conscious and easily stated by the holder or virtually unrecognized and 
unverbalizable. 

Interests are areas of experience about which a person wishes to under- 
take further learning or performing. In this sense, an interest in something 
is a positive attitude toward it; an interest, then, is a kind of attitude. 
"Tennis is fun" may be the attitude which underlies an interest in tennis. 
To an extent, interests are more intimately connected to the self-image than 
attitudes. That is, attitudes, particularly quasi-intellectual ones such as 
whether the United States should have the world's largest navy, may change 
over time as the person learns new facts. Interests shift also but they are 
probably more stable components of the person than most attitudes are. Inter- 
ests in some ways arise from deeper psychological processes involving the 
establishment of the self and its fulfillment. 

Values are very broad attitudes or interests. The Allport-Vernon-Lindzey 
Study of Values, for example, describes a person in terms of six broad areas 
as originally proposed by Spranger (1929). These are: theoretical, economic, 
aesthetic, social, political, and religious. They are perhaps best thought of 
as broad classes of attitudes or interests; they are considered to be dominant 
aspects of the personality, motivating drives, and fundamental governors of 
behavior. 

An appreciation is an achieved perception of the value or nature of 
something. It is an attitude, a judgment, but it connotes a learned set of 
perceptions which precede the affective reaction. Like an interest, it is 
almost always conceived of as positive, although one can speak of an appreci- 
ation of the dangers of drugs or of reckless driving. 

Attitudes, interests, values, and appreciation are probably the four main 
aspects of educational measurement in the affective domain. Opinions, as sug- 
gested above, can be thought of as a subclass of attitudes. While other 
words offer potential clarity in some contexts, these four labels are a work- 
able and comprehensive base. 

In some ways, the self-concept and self-related evaluations do not fit 
neatly into the framework. The self is a very central concept, close to the 
core of the person. It is, in a sense, a learned appreciation. Although 
there are definite attitudes toward the self, it is probably wise to recognize 
this area as distinct from other appreciations and attitudes. The techniques 
for gathering information about the self are not essentially different from 
the techniques for learning about other internal characteristics, but the 
degree of revelation is different, and the development of instruments in this 
area poses special challenges. 

Ethical Aspects 

Measurement and instruction in the affective domain face some problems which 
the traditional cognitive and achievement areas do not confront as directly. 




These relate to the rights of persons to develop in a manner determined by 
their natural characteristics, by the kinds of persons they are. The extent 
to which schools attempt to influence aggressivity, for example, has ethical 
aspects. If a given student seeks counseling and is supported in it by the 
parents, then a counselor might test for aggressivity, identify it as the 
troubling area of the person, and work to modify it. But schools cannot 
enter the affective domain as "engineers" seeking to create specific kinds of 
people who are valued by educational authorities. 

This ethical conflict between the need to give students self-benefiting 
attitudes and the danger of unnecessarily imposing values on them has surfaced 
in a number of contexts. The role of schools in the acquisition or rejection 
of religious values is a good example. Do the schools have the right to 
foster positive attitudes toward religion by permitting basically respectful 
pageantry during religious holidays? Even if the problems of recognizing 
religious minorities are surmounted, the rights of others are a sensitive 
issue in an egalitarian society. 

A less difficult area but one that is not without its problems has to do 
with the attempts to influence students' attitudes toward drugs. There are 
deeply held emotional values running through all areas of the affective domain, 
and the recognition of the diversity of these values is an essential element 
of successful programs. Operations in the affective domain, be they measure- 
ment or instruction, must be constantly reviewed for ethical considerations. 

Relating to this is the question of cooperation in measurement in the 
affective domain. As difficult as it is for the measurement worker to sur- 
render potential information, or to deal with self-selected subsets of his 
original, total group (because some elect not to respond to certain material), 
the best principle is one which clearly indicates to the respondent that the 
cooperation is optional, that there is a principle of privacy, and that no 
response is required if it will produce discomfort or conflict within the 
person. This is perhaps particularly needed in tests of self-concept. 

Most persons experience little difficulty in communicating, particularly 
if the measurements are retained with a reasonable degree of confidentiality. 
Schools, however, should be careful to institute adequate review procedures 
for all assessments in the affective domain, so that the rights of individu- 
als are preserved. Holman and Docter (1972) have a succinct discussion of 
these issues and offer some bibliographic references, of which one, 
Ruebhausen and Brim (1965), is devoted to legal issues. 

TECHNIQUES FOR ASSESSMENT 
Paper-and-Pencil Approaches 

By far the most common approach to measuring affective characteristics is to 
offer the person some way of providing a self-report by choosing alternatives 
or endorsing responses in a printed form. In a measure of self-concept, for 
example, the statement "I am much less organized than the average person" 
might be provided, and the person asked to respond with a choice or endorsement 
of some kind. Broadly, then, this approach is a stimulus-response technique, 
in which the stimulus is some verbal input, and the response is the individual's 
endorsement or rejection of it. There is considerable variation in the formats 
for such self-report surveys, both in the presenting of the stimulus and in the 
eliciting of the response. For example, in responding to the statement above 
about degree of organization, persons could simply indicate "true" or "false," 
or they might be given an opportunity to select from a somewhat broader scale 
of alternatives: 
I STRONGLY AGREE with this statement 

I AGREE with this statement 

I am UNDECIDED about this statement 

I DISAGREE with this statement 

I STRONGLY DISAGREE with this statement 

Frequently, after an introductory set of instructions, these possible 
responses are coded SA, A, U, D, and SD. The variety of formats is very 
great. A line can be drawn offering a kind of scale, and the individual can 
place a check mark along this line: 

|---------|---------|---------|---------|
SA        A         U    /    D         SD

Again, a variation on the offering of true-false endorsements of self- 
concept statements is simply to ask for a check mark on a check list of self- 
descriptive traits. On the other hand, not infrequently the response options 
are prepared on a separate answer sheet which can be scored by machine. In 
general, then, there are a large number of potentially workable formats and 
no overwhelming rationales for asserting the superiority of one to another. 
There has been fairly extensive empirical work on the relative merits of some 
of the different methods, as in the study by Jackson, Neill and Bevan (1975) 
comparing forced-choice and true-false formats; but in general an instrument 
developer can proceed to use practical judgment without fear that some tech- 
nical rule will be violated. A practical and common sense adjustment of the 
general stimulus-response format to the needs and characteristics of the group 
being worked with is all that is needed. In adapting the methods to children, 
for example, such verbal categories as Agree-Undecided-Disagree can be 
replaced with simple pictures. 
For children in the first four or five grades, responses through such a format 
may be more accurate and more highly motivated than responses through more 
abstract verbal endorsements. An approach such as this is part of the 
Minnesota School Affect Assessment (Ahlgren, Christensen, & Lun, 1975). 

Where four or five choices are offered as optional responses, the method 
has similarities to the familiar multiple-choice tests which are widely used 
in cognitive tests. It differs, of course, because in the affective domain 
there is no "correct" answer, and because the optional responses tend to differ 
only in degree rather than in basic qualitative content as they do in cognitive 
tests. But there is in common a choosing among alternatives, a selection of 
options. In cognitive tests, giving each alternate wrong answer the proper 
qualities is demanding and skilled work. However, in most work in the affec- 
tive domain there is little need for highly specialized skills in order to 
prepare an appropriate response. Nor is there a great need for special train- 
ing in preparing stimuli. In order to assess attitude toward a school- 
expansion program, for example, simple statements along the following lines 
can be offered: 

The proposed new school is too expensive      SA   A   U   D   SD 

The proposed new school is too large          SA   A   U   D   SD 

A swimming pool should be incorporated 
in the new school                             SA   A   U   D   SD 
The creation of such stimuli demands common sense, a knowledge of human- 
ity, and an ability to create reasonably clear language, but no great technical 
expertise. This is not to say that there are not good and bad statements, or 
clear and unclear, and so on. But in much self-report work there is a straight- 
forwardness of communication that places the creation of adequate stimuli well 
within the ability of a teacher or counselor. 

A basic method, then, exists for affective measurement: a stimulus- 
response method which is relatively inexpensive to prepare, which requires no 
very formidable technical training, and which can be inexpensively scored in 
most cases. One might hope that Lindquist's pessimism cited earlier was pre- 
mature. However, as they say in the jokes, that is the good news; now for the 
bad news. 

The basic method of self-report by responses to statements is full of 
problems which complicate the interpretation of the results and which weaken 
the validity of the measures. For example, in considering interest assessment, 
Schwarz (1971) remarks: 

The problem in assessing interest has been that simply asking the 
individual about his interests in various curricula or occupations 
seldom results in the information desired. . . . The answers to 
direct questions necessarily are generalized responses based in 
part on erroneous or irrelevant impressions. . . . 

That is, the individual's perceptions are influenced by her or his own 
personal experience. The stimulus, then, is always somewhat ambiguous. Do 
you like journalism? The meaning of a "Yes" or "No" response to such a ques- 
tion can seldom be clear, for there is an unwieldy breadth to the concept of 
"journalism." Similarly, attitude-assessment stimuli such as "The proposed 
new school is too expensive" will receive similar endorsements from quite 
different-minded people. One person may respond "Strongly Agree" because 
education per se is held in low regard, another simply because some component 
of the plans (a new gymnasium or a vocational shop) is considered a frill. 
Inferences about attitudes drawn on the basis of marks or responses are 
highly vulnerable. 

The solutions to these problems are not simple. While stimuli should be as 
specific as possible, detailed breakdowns of stimuli can prove cumbersome. 
Analyzing journalism into free-lancing, sports reporting, editing, cartoon- 
ing, opinion columns, and so on, can produce tedious decision-making that 
taxes the information base of the respondent. Inferences simply have to be 
made on practical grounds. There are, however, other problems with direct 
self-report approaches besides the inherent, logical problem of the verbal 
ambiguity of stimulus and response. The so-called response sets reflect the 
influence on the respondent of his or her awareness that the instrument is a 
communication about the self. A common response set growing out of this 
awareness is social desirability. First proposed by Edwards (1957), this is 
the tendency to "put up a good front," to distort personal choices in the 
direction of what is considered socially ideal. Thus, interests in higher 
paying or prestigious occupations may be expressed not because one is, in 
fact, attracted to medicine or the law but because one cannot admit in the 
context of the affective test that these really are not where the interests 
lie. Often, as Edwards pointed out, the individual is not conscious of her 
or his deception. We all like to perceive ourselves in the best way and we 
make the socially desirable response to please ourselves as much as others. 
The techniques for combating some of these problems offer only a limited 
success. One of the major strategies for guarding against social desirabil- 
ity as a response set, for example, has been the forced-choice technique. 
In this technique, the stimuli are not presented alone but in groups, and 
the responses usually consist of identifying the extremes of the set. For 
example, in assessing interest in school subjects, one might create sets of 
three subjects and force the respondent to indicate a "most preferred" (M) 
and a "least preferred" (L) subject: 

Physics               M   L 
History               M   L 
English               M   L 

Physical Education    M   L 
Woodworking           M   L 
Home Economics        M   L 

If the social desirability of the stimuli is determined beforehand, 
through judgments by raters, all of the stimuli in each set of three can 
have about the same social desirability. The choices, then, are believed to 
be more securely based on actual preferential feelings about the stimuli. 

However, it has been demonstrated that sophisticated test takers can 
still distort responses even in the forced-choice approach, and, further, 
that the scores that are reached by adding up the results have a somewhat 
negative characteristic: The judgments were all relative rather than absolute 
and so the results reflect more the rank order of the stimuli than their 
absolute level. Further, the scores commonly have a built-in influence on 
their intercorrelations, called "ipsativity," which makes them somewhat diffi- 
cult to interpret in standard statistical analyses. 
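
The ipsative property can be illustrated with a small sketch. The scoring rule used here (+1 for a "most preferred" mark, -1 for a "least preferred" mark, 0 otherwise) and the function name are assumptions made for illustration; the paper does not specify a scoring rule. Under any such rule, every respondent's scores sum to the same constant, so a high score on one subject forces lower scores elsewhere.

```python
def score_forced_choice(sets, choices):
    """Score forced-choice triads.

    sets    -- list of groups of stimulus names
    choices -- one (most_preferred, least_preferred) pair per group
    Scoring rule (an assumption for illustration): +1 for M, -1 for L.
    """
    scores = {name: 0 for group in sets for name in group}
    for group, (most, least) in zip(sets, choices):
        if most not in group or least not in group or most == least:
            raise ValueError("M and L must be two different stimuli in the set")
        scores[most] += 1
        scores[least] -= 1
    return scores


sets = [
    ["Physics", "History", "English"],
    ["Physical Education", "Woodworking", "Home Economics"],
]
# Two hypothetical respondents with quite different preferences.
scores_a = score_forced_choice(
    sets, [("Physics", "English"), ("Woodworking", "Home Economics")])
scores_b = score_forced_choice(
    sets, [("English", "History"), ("Physical Education", "Woodworking")])

# Both totals are identical, whatever the respondent prefers.
total_a = sum(scores_a.values())
total_b = sum(scores_b.values())
```

Because the total is fixed, the scores carry information only about the rank order of the stimuli within a person, not about absolute level.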

The forced-choice approach, then, is a logical response to certain 
problems of response sets, but it is so imperfect that, in balance, it would 
not commonly be recommended to nonprofessional test constructors. The work 
of assessing social desirability or other stimulus characteristics beforehand 
will not often seem to be worth the results. 

Several types of stimulus-response scales are so well known as to require 
specific mention: The Likert, the Thurstone and the Guttman approaches. Both 
the Likert and the Thurstone approaches present stimuli singly rather than as 
forced choices. In the Thurstone approach, however, the stimulus is simply 
checked or endorsed as true of the respondent or not true. In the Likert 
approach, the response is given on a graded scale of (usually five) categor- 
ies, such as Strongly Agree, Agree, Undecided, Disagree, and Strongly Disagree. 
These differences in response methods lead naturally to differences in scoring 
methods also. The Likert scale creates different weights for each possible 
response on its scale (say, 5 for Strongly Agree, 4 for Agree, and so on) and 
adds up a total score of all the weights for the responses selected. The 
Thurstone scale determines a unique weight for each stimulus statement by 
asking judges beforehand and then takes the median value of the weights of all 
the statements selected. Between them, the Thurstone and Likert scales 
account for the bulk of instrument development in education and psychology. 
Thurstone procedures require somewhat more elaborate preliminary development 
and statistical knowledge. In the long run, however, both are stimulus- 
response scales, differing more in the nature of the response and the numeri- 
cal value attached to it than in anything else. A practical description of 
Thurstone procedures is offered in a paper by Murray (1971), which is avail- 
able as an ERIC document. 
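
The two scoring rules just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the Thurstone scale values below are invented stand-ins for weights that would in practice come from judges.

```python
from statistics import median

# Likert weights as described in the text: 5 for Strongly Agree down to 1.
LIKERT_WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def likert_score(responses):
    """Sum the weights of the responses selected, one per statement."""
    return sum(LIKERT_WEIGHTS[r] for r in responses)


def thurstone_score(scale_values, endorsed):
    """Median scale value of the statements the respondent endorsed.

    scale_values -- judged weight of each statement (hypothetical here)
    endorsed     -- indexes of the statements checked as true
    """
    chosen = [scale_values[i] for i in endorsed]
    if not chosen:
        raise ValueError("no statements were endorsed")
    return median(chosen)


likert_total = likert_score(["SA", "A", "U", "D", "SD"])

# Invented judge-derived scale values for five statements.
judged_values = [1.5, 3.0, 4.5, 7.0, 9.5]
thurstone_value = thurstone_score(judged_values, endorsed=[1, 2, 3])
```

The contrast is visible in the inputs: the Likert total depends on how strongly each statement is endorsed, while the Thurstone value depends only on which statements are endorsed and on the weights fixed beforehand.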
A Guttman scale is another concept in this field. This approach 
assumes that an ideal scale will have the property that any individual who 
responds positively to a higher-ranking stimulus will also respond positively 
to a lower-ranking one. Let us suppose that 10 statements are prepared as 
stimuli. These are specifically selected to vary in the level or intensity 
of attitude they reflect, and the respondent is asked to indicate those 
which he or she can personally endorse. In theory, if one knows the highest 
level statement which is endorsed, one knows that a) no endorsements of 
higher level statements were given, and b) all lower-ranking statements 
were endorsed. For a number of reasons, people are seldom this consistent 
in responding to statements, and a Guttman scale is an ideal not often 
attained in practice. Like the Thurstone approach, it requires a consider- 
able amount of rather complicated statistical work. 
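
A rough sketch of how the Guttman property might be checked is given below. The error count used (deviations from the ideal pattern implied by the respondent's total) and the resulting reproducibility figure follow one common convention, which is an assumption here and not a procedure taken from this paper; other counting conventions exist, and the function names are hypothetical.

```python
def is_perfect_guttman(pattern):
    """True if endorsements (1) all precede non-endorsements (0).

    pattern lists responses from the lowest- to the highest-ranking statement.
    """
    seen_zero = False
    for response in pattern:
        if response == 0:
            seen_zero = True
        elif seen_zero:
            return False
    return True


def guttman_errors(pattern):
    """Count responses that differ from the ideal pattern with the same total."""
    total = sum(pattern)
    ideal = [1] * total + [0] * (len(pattern) - total)
    return sum(actual != expected for actual, expected in zip(pattern, ideal))


def reproducibility(patterns):
    """1 minus the proportion of responses counted as errors."""
    n_responses = sum(len(p) for p in patterns)
    n_errors = sum(guttman_errors(p) for p in patterns)
    return 1 - n_errors / n_responses


# Three hypothetical respondents answering five ordered statements.
patterns = [
    [1, 1, 1, 0, 0],  # consistent
    [1, 1, 0, 0, 0],  # consistent
    [1, 0, 1, 0, 0],  # skips a lower-ranking statement
]
coefficient = reproducibility(patterns)
```

The third respondent shows the kind of inconsistency that keeps most real item sets from forming a perfect scale.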

For most practical purposes, then, educators who need to develop affec- 
tive scales can probably rely upon Likert scales for attitude assessment as 
the most convenient approach. 

The measurement of interests is typically approached in a similar, 
stimulus-response way. The Strong Vocational Interest Blank (SVIB), for 
example, offers as stimuli the names of occupations. Responses are through 
a three-point scale of Like, Indifferent, or Dislike rather than a "standard" 
Likert five-point scale, but the approach has many basic similarities. The 
Strong instrument differs in that it derives its scores by a system of weights 
computed by preliminary sampling rather than assigning, say, 3, 2, and 1 to its 
three points and then adding them all up. The Strong approach is sufficiently 
complex to require a special discussion, but in terms of the format for stimu- 
lus and response, the Strong Vocational Interest Blank resembles the use of 
Likert, or graded-response, scales. The weights in the SVIB are derived by 
comparing the responses of members of specific occupations with those of 
people in general. For example, an item might be evaluated as follows with 
respect to bakers: 

Working with my hands 

Responses                        Like         Uninterested   Dislike 
Responses of bakers              55 percent   35 percent     10 percent 
Responses of people in general   30           35             35 
Difference                       +25          0              -25 
Weight                           +1           0              -1 

This approach is consistently used throughout, with the difference in 
the percentages being used to assign weights. The total raw score is the sum 
of all the weights, positive or negative. 

The Strong approach to weighting is interesting but demands large popula- 
tions; the rationale essentially focuses on statistically significant differ- 
ences among groups, and could not often be successfully used in developing 
measures for use in a given institution. Similarly, the Kuder scales for 
interest measurement, requiring the respondent to select the most attractive 
and the least attractive of a set of three activities, do not offer an 
easily reproducible technique for individual institutions. 

A potentially useful technique for institutional researches is the 
Semantic Differential. It derives this high-sounding name from its origin 
as a research tool for psycholinguists (Osgood & Suci, 1955). Osgood and 
his associates were interested in problems of the meanings of words. They 
devised a format for securing judgments and feelings about words. For example, 
the word WOLF might be presented this way: 
WOLF 

kind  : ___ : ___ : ___ : ___ : ___ : ___ : ___ :  cruel 
big   : ___ : ___ : ___ : ___ : ___ : ___ : ___ :  small 
sweet : ___ : ___ : ___ : ___ : ___ : ___ : ___ :  sour 



Each of the pairs of opposite words creates a kind of scale, ranging from 
one opposite pole to the other. Thus, there are intermediate points between 
"kind" and "cruel." The instrument designer offers a number of intervals 
between the poles as potential choices, and the respondent selects an interval 
on the scale. The use of seven intervals is a fairly common practice, although 
no specific number is mandatory. Each interval is assigned a weight which, for 
convenience, is a whole number; thus, if the respondent checks the interval 
nearest "kind," this might be scored as a "7" on that scale, with the interval 
nearest to "cruel" being scored 1. The individual's score is the sum of all 
these scale values. 
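
The scoring just described can be sketched as follows. The sketch assumes, for illustration only, that every scale is keyed the same way (the interval nearest the left-hand pole receives the highest weight); the function names and the sample responses are hypothetical.

```python
def scale_value(position, n_intervals=7):
    """Weight for a checked interval.

    position 1 is the interval nearest the left-hand pole (e.g. "kind"),
    which receives the highest weight; position n_intervals is nearest
    the right-hand pole (e.g. "cruel") and receives a weight of 1.
    """
    if not 1 <= position <= n_intervals:
        raise ValueError("position is outside the scale")
    return n_intervals + 1 - position


def semantic_differential_score(positions, n_intervals=7):
    """Sum of the scale values over all scales checked by one respondent."""
    return sum(scale_value(p, n_intervals) for p in positions.values())


# One hypothetical respondent's check marks for the concept WOLF.
wolf_responses = {"kind-cruel": 6, "big-small": 2, "sweet-sour": 5}
wolf_score = semantic_differential_score(wolf_responses)
```

Whether a simple sum of this kind is meaningful depends on the scales actually belonging together, which is an empirical question.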

The test constructor has to know certain things in order to develop a 
successful Semantic Differential and score it. The various scales have to be 
able to be added together if the total score is to have meaning. That is, 
they have to correlate, so that there is a tendency for those who think wolves 
are kind to think other positive thoughts about them. It is appropriate to 
find out which scales go together by doing a statistical analysis of the 
results, weeding out scales that don't contribute but deriving separate scores 
for those that offer independent information. 
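
One simple form such an analysis might take is a table of correlations between pairs of scales. The sketch below computes ordinary Pearson correlations on invented data for five respondents; the data, the scale names, and the helper names are all assumptions for illustration, and a real study would more likely use a factor analysis on a much larger sample.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def scale_correlations(data):
    """Correlation for every pair of scales; data maps scale name to scores."""
    names = list(data)
    return {
        (a, b): pearson(data[a], data[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented scale values (7 = left-hand pole) for five respondents.
data = {
    "kind_cruel": [7, 6, 2, 1, 4],
    "sweet_sour": [6, 6, 3, 2, 4],
    "big_small": [2, 5, 3, 5, 3],
}
correlations = scale_correlations(data)
```

In this invented example the first two scales rise and fall together and could reasonably be summed, while the third behaves differently and would be a candidate for a separate score.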

The use of the Semantic Differential in assessing attitudes in educational 
settings is exemplified by "Semantic Differential for Measuring Attitudes of 
Elementary School Children Toward Mathematics" (Scharf, 1971). Below is a 
sample of the stimuli and the response scales: 

Taking a Math Test is: 

              Very : Sort of : Neither : Sort of : Very 
[illegible]        :         :         :         :        good 
HAPPY              :         :         :         :        SAD 

The same set of response scales may be used to assess additional dimen- 
sions of the subject. Thus, Scharf studied such other stimuli as "My Math 
Class is" and "Doing Math is." 

A Semantic Differential is easy to construct, and most respondents find 
it intuitively easy to understand what is wanted. An interesting feature of 
this approach is that the respondents will often tolerate quite unusual 
scales, make meaningful responses, and the responses to these scales can 
offer useful information. This has to be checked by empirical methods, of 
course, but after a set to respond has been developed, one can ask where the 
concept FATHER stands on a scale from Valuable to Worthless and get a plausi- 
ble response, even though it is rare to hear people say "My father is very 
valuable!" Similarly, in a Semantic Differential reflecting attitudes toward 
a home room, one could create the following scales: 

My Home Room is 

QUIET            NOISY 
CROWDED          ROOMY 
HOT              COLD 
DUSTY            CLEAN 
KIND             CRUEL 



It is frequently possible, in the context of a number of judgments, to 
have scales such as KIND or CRUEL be meaningful to the respondents and to 
offer a sufficiently oblique avenue for response that some of the defensive 
response sets are avoided. 
Because of its ease of construction and its acceptability to respondents, 
the Semantic Differential is a very useful technique. It is fairly widely 
used in educational research. Descriptions of techniques for constructing 
Semantic Differentials are found in Kerlinger (1967) and Maguire (1972). 

In spite of the formal differences between them, Likert Scales and
Semantic Differentials have a broad commonality as stimulus-response scales.
Each calls for a response to a stimulus by selecting from a graded series of
options, and it ought to be possible to secure somewhat the same results by
adapting one technique to the other. For example:



My homeroom is quiet.       SA    A    U    D    SD
My homeroom is crowded.     SA    A    U    D    SD
My homeroom is hot.         SA    A    U    D    SD
My homeroom is dusty.       SA    A    U    D    SD
My homeroom is kind.        SA    A    U    D    SD



This Likert-type equivalent to the Semantic Differential given above ought
to provide much the same information.

It has been suggested that Semantic Differential scales be provided with
adverbial descriptors, as follows:

Weak  _________ : _____ : ________ : ________ : _____ : _________  Strong
      extremely   quite   slightly   slightly   quite   extremely

Thus, Wells and Smith (1963) found that there was greater differentiation 
and an avoidance of end points when the adverbial modifiers were included. 
Such additions underscore the similarity to the Likert approach. 

There is a verbal efficiency to the Semantic Differential, however, that
probably gives it an edge when the sought-for attitude can be captured in
words or brief phrases which satisfy the requirements for a scale of opposites.
With more complex concepts and opinions, such as "My homeroom is an excellent
place to learn what's going on in school," Likert-type stimuli probably have
the edge.

Likert scales or the Semantic Differential are workable techniques, but 
the interpretation of the scores they yield has been essentially normative. 
That is, if one creates a ten-statement Likert scale of attitudes toward 
mathematics, the best basis for evaluating it would seem to be normative, by
giving the scale to some students and studying the responses, letting statisti- 
cal rarity guide the assessment of what is or is not important. Similarly, 
responses to a Semantic Differential of 10 scales would be handled in this 
way. The increasing attention to criterion-referenced measurement in the 
areas of skills and knowledges, however, has implications for affective 
measurement as well. A careful review of the instruments, considered in the 
light of the context in which they are administered and the decisions to 
which they should contribute, may suggest a critical level or levels, and a 
knowledge of such levels may help in the design or redesign of the instrument. 
Self-concept measures, for example, may be evaluated by predetermined
evaluative criteria established by teachers and counselors. It is not easy to
reach or defend such criterion levels, but it is probably an important safe- 
guard against the passively accepted nonrational standards which can result 
from an overly timid reliance upon norms. 
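
The contrast between the two interpretations can be made concrete with a small sketch. It is illustrative only: the norm-group totals, the student's total, and the critical level of 35 are hypothetical values for a ten-statement scale scored 1 to 5 per statement, and the half-credit treatment of ties in the percentile rank is one common convention, not something the text prescribes.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Normative reading: percent of the norm group below `score`,
    counting tied scores as half below."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def meets_criterion(score, critical_level):
    """Criterion-referenced reading: compare the score with a level fixed
    in advance (for example, by teachers and counselors)."""
    return score >= critical_level


# Hypothetical totals on a ten-statement scale (possible range 10 to 50).
norm_group = [18, 22, 25, 27, 30, 31, 33, 36, 40, 44]
student_total = 33
critical_level = 35  # hypothetical predetermined level

print(percentile_rank(student_total, norm_group))      # 65.0
print(meets_criterion(student_total, critical_level))  # False
```

In this made-up case the same total looks comfortably above average by the norms yet falls short of the predetermined level, which is the kind of disagreement a review of critical levels is meant to surface.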

Similarly, the individual stimuli or statements in an affective instru- 
ment are often worthy of a careful review. A total score, with its abstract 
label, is more reliable and probably more valid than the individual components, 
but the content of the individual stimuli can often give insight as to where
to go from here. Almost certainly, the individual stimuli themselves can be
analyzed further; people who oppose the new gym and students who don't like
math often have more to say on the subject. It is possible to do backup
sampling or checking to bore further into the nature of the situation.

Measures of attitudes, values, and interests are often of greatest 
interest as descriptions of groups rather than individuals. A convenient 
way to display such information is to show the proportion of the group that 
selects one of the responses. This approach is appropriate for either a 
Likert-type scale or a Semantic Differential. This way of formulating results 
is often most interesting because of the contrasts it affords between sub- 
groups with different abilities. The following example contrasts high school 
juniors and seniors with respect to attitudes toward dress code: 

The dress code in our high school is too strict.

             SA            A             U             D     SD
Juniors      40            32            18            10     0
Seniors      28   [illegible]   [illegible]            16    10
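
A display of this kind can be produced directly from the raw responses. The following is a minimal sketch, assuming each respondent's answer is recorded as one of the five option codes; the group names and raw counts are hypothetical and serve only to show the tabulation.

```python
from collections import Counter

OPTIONS = ["SA", "A", "U", "D", "SD"]


def response_percentages(responses):
    """Percent of a group choosing each option, rounded to whole numbers."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: round(100 * counts[opt] / n) for opt in OPTIONS}


def tabulate(groups):
    """Build a plain-text table with one row of percentages per subgroup."""
    lines = [" " * 10 + "".join(f"{opt:>5}" for opt in OPTIONS)]
    for name, responses in groups.items():
        pct = response_percentages(responses)
        lines.append(f"{name:<10}" + "".join(f"{pct[opt]:>5}" for opt in OPTIONS))
    return "\n".join(lines)


# Hypothetical raw responses: 50 juniors and 25 faculty members.
groups = {
    "Juniors": ["SA"] * 20 + ["A"] * 16 + ["U"] * 9 + ["D"] * 5,
    "Faculty": ["SA"] * 5 + ["A"] * 5 + ["U"] * 5 + ["D"] * 5 + ["SD"] * 5,
}

print(tabulate(groups))
```

Because each percentage is rounded separately, a row may not total exactly 100; with small subgroups it is probably wise to report the group sizes alongside the percentages.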

Such contrasts of subgroups are often powerful contributors to an under- 
standing of the social context within which attitudes operate. Further, they 
are often of greatest interest to the respondents themselves. Assessment in 
the affective domain is usually intrinsically interesting to the members of 
an institution, for it functions as a sort of mirror of the social context. 
Announcing the results of questionnaires and surveys, analyzed by subgroups
with which people can identify (for example, administrators, faculty,
students), not infrequently leads to the pinpointing of areas of difference
which may be obstacles to communication. In a sense, affective results are
somewhat freer of the ego threat that often lies in achievement scores;
people will talk about them more.

From the foregoing review of major paper-and-pencil strategies, it should
be clear that affective assessments depend in large measure on statistical
operations. Establishing Thurstone scale values, determining Likert scale
internal consistency, and defining the clusters of semantic differential
scales all require some basic statistical operations. There is a danger in
this need for analysis, however. The affective domain is extraordinary in
its complexity and multidimensional nature. We do not know any grand design
for the affective domain, and interests, values, and appreciations can be
organized and subdivided in a variety of ways. It is possible to literally
explode the interest domain by factor-analytic methods, subdividing it into a
larger and larger number of increasingly specific interests.
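
As an illustration of one of the basic operations mentioned above, the sketch below estimates the internal consistency of a Likert scale. It assumes coefficient alpha as the index, which the text does not name, and the response matrix is hypothetical (five respondents, four statements, each scored 1 to 5).

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a respondents-by-items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    The total-score variance must be nonzero.
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: rows are respondents, columns are statements.
responses = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 2, 1, 2],
    [1, 2, 2, 1],
]

alpha = coefficient_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

A single summary index of this sort is usually enough for institutional purposes; it does not require the kind of extended factor-analytic subdivision described above.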

The moral in this is to resist the temptation to overmanipulate the
data. Psychological constructs are typically fragile things, often depending
on specific populations and circumstances in order to demonstrate them.
The institutionally based worker should keep in mind the decisions or needs
which confront the institution and the logic of the data as they relate to
these requirements. It does no good to offer a fifteen-factor analysis of
rather tenuously labeled qualities such as "Attitude toward Science" or
"Attitude toward Punctuality" if what is needed is some general assessment of
the degree to which the students feel positively about the school.
Noncognitive and affective assessment contains this pitfall, and instrument
users and developers should be aware of it.

Other Approaches

Pencil-and-paper affective self-report instruments are the basic techniques
for assessment, but there are a number of others worth mentioning. This
paper will focus on three: Q-sorts, interviews, and unobtrusive measures.
Each of these is a somewhat more involved procedure than the paper-and-pencil
approaches. Q-sorts and interviews tend to focus on one person at a time, and
unobtrusive measures may demand rather elaborate recording devices. In the
Q-sort technique, the experimenter asks the respondent to place a collection
of stimuli in order from one end of a continuum to another. For example,
the stimuli can be adjectives, and the respondent can rank them along a
continuum from "most like me" to "least like me." The ranking is most often
done by putting the stimuli into a distribution, and the distribution is
prescribed in advance. Thus, if there are 10 adjectives, the respondent may be
told to put them into 5 piles of 1, 2, 4, 2, and 1 adjectives each. That is,
respondents select the one adjective that is most like them, two more that
are next most like them, four middling adjectives, and finally the
next-to-least pile of two and the single "least like me" stimulus. There are
disputes among Q-sorters about what kind of instructions to give the
respondent concerning the number of piles and the number of stimuli in each,
but whatever the approach, the method yields a sorting of the stimuli along a
quantitative continuum, and hence its name.
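
The forced distribution just described can be sketched in a few lines. This is an illustration under stated assumptions: the function name `pile_scores` and the ten adjectives are hypothetical, and giving the "most like me" pile the highest score (5 down to 1) is a scoring convention chosen for the example, not one prescribed by the text.

```python
def pile_scores(ordered_stimuli, pile_sizes):
    """Turn a completed sort into one score per stimulus.

    `ordered_stimuli` runs from "most like me" to "least like me";
    `pile_sizes` is the prescribed distribution, e.g. [1, 2, 4, 2, 1].
    """
    if sum(pile_sizes) != len(ordered_stimuli):
        raise ValueError("pile sizes must account for every stimulus")
    if len(set(ordered_stimuli)) != len(ordered_stimuli):
        raise ValueError("each stimulus may be placed only once")

    scores = {}
    position = 0
    for pile_index, size in enumerate(pile_sizes):
        score = len(pile_sizes) - pile_index  # first pile scores highest
        for stimulus in ordered_stimuli[position:position + size]:
            scores[stimulus] = score
        position += size
    return scores


# Hypothetical sort of ten adjectives into piles of 1, 2, 4, 2, and 1.
sort = ["curious", "calm", "careful", "cheerful", "quiet",
        "patient", "talkative", "restless", "shy", "moody"]
scores = pile_scores(sort, [1, 2, 4, 2, 1])
print(scores["curious"], scores["quiet"], scores["moody"])  # 5 3 1
```

The order of stimuli within a pile carries no information, which is why every stimulus in a pile receives the same score.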

The Q-sort method has its most interesting properties in the emphasis
it places on individuals. In instructional evaluation, for example, a
precourse Q-sort of attitude statements can be compared with a postcourse
Q-sort, and the similarity between them assessed as a correlation. Similarly,
it is common practice to analyze group Q-sorts so as to locate clusters of
similar people rather than the more familiar clustering of stimuli into
scales. People typically enjoy a Q-sort if there aren't too many statements;
as more get added, or as the rules as to piles and the numbers in them get
complex, it becomes a less attractive method.

Q-sorts are attractive in that they foster hypothesis testing. The
experimenter can try to devise a set of stimuli which he believes the subjects
will sort in a predictable way, and then test this hypothesis. Attitudes
toward mathematics, for example, may be hypothesized to be distributed in one
way for successful students and in another for unsuccessful students.

Two contrasting Q-sorts of self-descriptive adjectives, one from a person
who basically likes himself, one from a person who dislikes himself, might
look like this:

                     Sort 1             Sort 2
                     Likes Himself      Dislikes Himself

Most true of me      Able               Unimportant

                     Good               Passive
                     Strong             Kind

                     Energetic          Unsociable
                     Kind               Energetic
                     Interesting        Friendly
                     Friendly           Strong

                     Passive            Good
                     Unsociable         Interesting

Least true of me     Unimportant        Able

It is possible to calculate correlations between such sorts, for they 
are basically elaborate rankings of the adjectives. The correlation between 
these two individuals would be highly negative. The study of similar correla- 
tions based on an individual over time is often useful as an index of personal 
stability or change. 
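
The calculation can be sketched for the two sorts above. Several assumptions are made for the purpose of illustration: the adjectives fall into piles of 1, 2, 4, 2, and 1; the piles are scored 5 down to 1; the third adjective in Sort 1 is read as "Strong," so that both sorts use the same ten adjectives; and an ordinary product-moment correlation is taken over the pile scores.

```python
from math import sqrt

PILE_SIZES = [1, 2, 4, 2, 1]  # assumed distribution for both sorts

# Adjectives listed from "most true of me" to "least true of me".
SORT_1 = ["Able", "Good", "Strong", "Energetic", "Kind",
          "Interesting", "Friendly", "Passive", "Unsociable", "Unimportant"]
SORT_2 = ["Unimportant", "Passive", "Kind", "Unsociable", "Energetic",
          "Friendly", "Strong", "Good", "Interesting", "Able"]


def to_scores(ordered, pile_sizes):
    """Score each adjective by its pile; the first pile scores highest."""
    scores, position = {}, 0
    for pile_index, size in enumerate(pile_sizes):
        for adjective in ordered[position:position + size]:
            scores[adjective] = len(pile_sizes) - pile_index
        position += size
    return scores


def pearson(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def sort_correlation(sort_a, sort_b, pile_sizes=PILE_SIZES):
    """Correlate two sorts of the same adjectives, adjective by adjective."""
    a, b = to_scores(sort_a, pile_sizes), to_scores(sort_b, pile_sizes)
    adjectives = sorted(a)
    return pearson([a[adj] for adj in adjectives],
                   [b[adj] for adj in adjectives])


r = sort_correlation(SORT_1, SORT_2)
print(f"r = {r:.2f}")  # about -0.83 under these assumptions
```

The same function applied to one person's precourse and postcourse sorts would give the index of stability or change mentioned above.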

The interview is an expensive, time-consuming, and in some ways frustratingly
unreliable approach to affective assessment. It is extremely rich and
full-dimensioned in the data it offers to the interviewer. It is superior to
paper-and-pencil approaches in that it permits the correction of
misunderstandings and the use of the interviewee's natural language. Its
weaknesses lie in the pressure it puts on the respondents to put their best
foot forward, to conceal the less "noble" aspects of self from the
interviewer.

Interviews have to be planned, and interviewers have to be trained. Matching
interviewers to interviewees to reduce incompatibility is a good practice.

As an appraisal technique for determining attitudes and interests, the
interview can produce a wealth of information on subjects, with questions
devised on the spot by the interviewer after considering previous answers.

Cooperative subjects often volunteer a great deal of information which
the experimenter failed to inquire about. To facilitate this effect,
interviewers should provide the interviewee with as full an account of the
purpose of the interview as can be given.

The greatest difficulties with the interview are the very size of the
data it offers and the fact that you need to quantify this information in
some way. It is, in a sense, a less-structured stimulus-response model in
which there are obvious stimuli (interviewer behaviors) and obvious responses
(interviewee behaviors), and thus a chance to make inferences about affective
characteristics, but the stimuli and the responses are so numerous and complex
that it's difficult to know how to organize the information. Structuring, in
the sense of predetermining most or all of the interviewer questions, helps
this by controlling interviewer behavior, but there are still real difficulties
in summarizing the information.

Nonetheless, interviewing is a sensible and valuable technique for
affective assessment with many important by-products in terms of the human
quality of the direct communication. Particularly where affective assessment
is undertaken in order to determine the nature of an institutional
environment, interviewing should be part of the overall assessment strategy.

Webb and his associates (1966) produced a book devoted to the topic of
unobtrusive measures, for which the best earlier statements were contained in
Selltiz et al. (1959). Essentially, this approach departs from the
stimulus-response methods of the earlier techniques and seeks to make
inferences about affective behavior by collecting data about everyday
behavior. It is a challenge to the investigator and at the same time a
corrective for some of the more indirect techniques.
A common example of unobtrusive measures is how close people stand to each 
other when they talk, as a measure of their mutual acceptance of each other. 
Another example is the amount of audience coughing during theater performances, 
as a measure of interest in the play. Interest in museum displays is assessed 
by the wear and tear on the tile floors in front of the displays. Archives 
are searched for records of class attendance, book usage at a library, and so 
on, in an effort to draw inferences as to interest. 

The method avoids some of the problems of alteration of response because 
of awareness of being assessed. But, because it is indirect and logical, it 
is open to errors of inference. Behaviors such as checking out library books 
are complexly determined and attributing a circulation increase to a poster 
campaign on reading may be entirely an error. Further, there are ethical 
aspects to the approach. Can you eavesdrop on student conversations? Can 
you check the wastebaskets after class? Watching behavior covertly may 
sound scientific, but it can be dangerously close to snooping.

Nonetheless, the unobtrusive methods are to be recommended. They force 
investigators to think of the behavioral consequences of affective states, 
and in so doing may help them to devise real-world measures which are more 
intuitively satisfying than the results of paper-and-pencil surveys.

Projective techniques are widely used in the study of personality
characteristics. The Thematic Apperception Test, in which the subject tells
stories based on a series of drawings, is a good example. They may be
extended to attitude measurement, however, although little formal work along
these lines has been reported. Perhaps the most promising format for
attitude assessment among the projective techniques is the sentence completion
approach. Subjects are asked to supply completions for attitude-relevant
sentences such as the following:

The greatest social need of our time is . . . 

The greatest problem in dealing with minorities is . . .

The greatest difficulty with such approaches lies in their unstructured
format. You can learn a great deal about attitudes, but you are at the
mercy of the respondent, in some ways. Thus, responses to the "greatest
social need of our time" will cover a gamut of concerns in which a given
one, such as socialized medicine, may be very infrequently mentioned. The
methods lend themselves more to exploring attitude domains and learning their
likely boundaries, and are not really suitable for hypothesis testing.

Summary 

The increasing interest in the affective domain in recent years has 
been met by a slow but steady expansion of technique and rationale in this 
area. Efforts have been made to formulate the definitions and to organize 
the logical structure that is essential for measurement. While much remains 
to be done, much has been accomplished. A number of specific strategies for
assessment are available, ranging in complexity and rationale from
paper-and-pencil scales to the unobtrusive methods and projective techniques.

These methods are seldom totally satisfying. While the development of
instruments has a straightforward logic and requires little technical theory
or body of knowledge, the gist of the methods is perceivable by the respondent,
and the distortion of responses, either consciously or unconsciously, is the
greatest single problem in working with them.

More so than in the area of cognitive achievements, there are ethical 
considerations to measurement in the affective domain. Attitudes, interests, 
values and appreciations are characterized by their affective component; the 
result is that communication about them is sometimes uncomfortable. Further, 
the establishing of objectives in this area is complicated by the problem of
imposing values on others, of rewarding or recognizing certain types of persons 
at the expense of others. A maximum openness in the sharing of information 
helps to relieve this ethical tension, and often secures the kind of respondent 
cooperation which is desirable, considering the limitations of the techniques. 

As important as the problems of measurement and assessment are, it is 
well to remember the cognate problems in the areas of instruction and curricu- 
lum. It is one thing to establish objectives in the affective domain; it is 
not so easy to institute sensible procedures for attaining them. Much progress 
in assessment will doubtless be made in the future, but it is likely that the 
greatest gains in the logical and ethical aspects of work in this domain will 
come through related gains in methods of instruction.